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We discuss the problem of parameter estimation in nonlinear stochastic differential equations 
based on sampled time series. A central message from the theory of integrating stochastic differential 
equations is that there exists in general two time scales, i.e. that of integrating these equations 
and that of sampling. We argue that therefore maximum likelihood estimation is computational 
extremely expensive. We discuss the relation between maximum likelihood and quasi maximum 
likelihood estimation. In a simulation study, we compare the quasi maximum likelihood method with 
an approach for parameter estimation in nonlinear stochastic differential equations that disregards 
the existence of the two time scales. 
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I. INTRODUCTION 

The analysis of complex systems by nonlinear deter- 
ministic differential equations has attracted much atten- 
tion in recent years. Given a parameterized differential 
equation, best-fit parameters can be obtained by a least- 
squares minimization, see e.g. 

Often, however, the dynamics does not follow a 
strict deterministic law. In dissipative systems, due 
to fluctuation-dissipation theorems, thermal noise might 
have to be taken into account j^. Furthermore, in open 
nonequilibrium systems like in biology and economy, the 
dynamics is often also exposed to highdimensional, effec- 
tively random, influences, see for an example. This 
calls for a description by nonlinear stochastic differen- 
tial equations (SDE), respectively their corresponding 
Fokker-Planck equations |^-|^. While parameter esti- 
mation in linear stochastic and nonlinear deterministic 
differential equations even for data covered by additive 
observational noise is well known, see ^ for a review, the 
estimation of parameters in nonlinear SDEs is still under 
development. In this paper we discuss three approaches 
for parameter estimation in nonlinear SDEs. The results 
carry over to modeling by Fokker-Planck equations. 

The organization of this paper is as follows: In the 
next section, we briefly summarize the theory of inte- 
grating SDEs and exemplify its practice for the van der 
Pol oscillator undergoing stochastic forcing. In Section 

III we discuss three approaches for parameter estimation 
in nonlinear stochastic differential equations. In Section 

IV we compare two of these approaches in a simulation 
study using the stochastic van der Pol oscillator. 



II. INTEGRATING NONLINEAR STOCHASTIC 
DIFFERENTIAL EQUATIONS 

A. Theory 

A stochastic differential equation (SDE) with parame- 
ter vector 9 is given by: 



(1) 



where e denotes uncorrelated Gaussian noise or, in math- 
ematical terms, the increment of Brownian motion. This 
general form includes additive and multiplicative dynam- 
ical noise. d{6,x) is usually denoted as (deterministic) 
drift term, b(9, x) is called the diffusion term. 

The integration of a SDE is not straightforward. This 
is due to the mathematical problems of evaluating in- 
tegrals which involves the dynamical noise e, see pO| ] 
for a brief introduction and for a detailed discus- 
sion. Applying the same ideas underlying higher order in- 
tegration schemes for deterministic differential equation 
like Runge-Kutta to SDEs leads to hardly treatable 
stochastic integrals given in Thus, only low order 



integration schemes can be used. The lowest order so- 
called Euler-scheme for Eq. (j^) is given by : 



(2) 



x{t + St) = x{t) + St a{9, x{t)) + VStb{e, x{t))e{t) 



which is of order 1/2 for multiplicative noise and of order 
1 for additive noise. The characteristics of integration 
schemes for SDEs is the VSi term which results from the 
integration rules for white noise . 

For the task of parameter estimation, we assume that 
the system under observation can be adequately de- 
scribed by Eq. (^ with unknown parameter vector 9. 
The process is observed at discrete time points given by 
the sampling interval At. For nonlinear deterministic dif- 
ferential equations it is usually possible to chose identi- 
cal time steps for the integration and the sampling. The 
necessity of using a low order scheme for SDEs means 
that the integration step size St is usually much smaller 
than the sampling interval At by which the time series is 
recorded. Thus, while Eq. (H) can be written as 



x{t + St) = h{x{t)) + v(t) 



(3) 



with h{x{t)) a function that can be related back to the 
parameter vector 9 of a{9,x{t)) and ^{t) uncorrelated 
Gaussian noise, on the time scale At of sampling the 
relationships are more intricate. While it is still possible 
to formulate the dependence between present and future 
states as 



x{t + At) = g{x{t)) + T]{t) 



(4) 



the parameter vector 9 of d{9,x(t)) can, in general, not 
be inferred from g{x{t)). Furthermore, 77(t), while still 
uncorrelated with zero mean, does in general not repre- 
sent Gaussian noise. In other words, in nonlinear SDEs 
the relation between the mean of the conditional den- 
sity p{x{t + At)\x{t)) and the drift-term d{9, x{t)) is not 
explicitly known (The analogous problem is given for 
modeling such systems by their corresponding Fokker- 
Planck equation). Furthermore, the conditional density 
p{x{t+At)\x{t)) is, in general, not Gaussian although e{t) 
in Eqs. (|l|,|) is Gaussian. For these reasons parameter 
estimation in SDEs is a non-trivial task. 



B. An example 

To exemplify the practical issues of integrating SDEs, 
we choose the van der Pol oscillator lO : 



fj.{l — x'^) X - 



(5) 



This system exhibits a limit cycle due to the amplitude 
dependent change of the sign of the damping term. We 
expose it to a random force of unit variance, leading to 
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±2 = (1 - xl)x2 - xi + e 



(6) 
(7) 



C. A consequence 



where xi denotes the location and X2 the velocity. The 
Euler integration scheme for Eqs. is given by 



Xl{t 
X2{t 



St) = xi{t) 

5t)=X2{t) 



5tX2[t) 

St {fl{l-xl{t))x2{t) 



In the following we chose ^ = 3. To obtain a sampled 
time series of the system, one has to decide on St and At. 
At should be chosen so that the process is reasonably 
sampled. Characteristics of the stochastic van der Pol 
oscillator and related perturbed limit cycles have been 
investigated in ||lj,|l5|. These authors have shown that 
the mean period of this system is slightly smaller than the 
period of the corresponding deterministic system which 
is approximately 9 s. By choosing At = 0.5, we obtain 
approximately 18 data points per mean period. 

The choice of the integration step size St is more dif- 
ficult. For deterministic systems adaptive algorithms 
are well established that guarantee an upper bound for 
the deviation from the true trajectory ||T^. No corre- 
sponding straightforward procedure is available for SDKs. 
For SDEs the characteristic quantities are the condi- 
tional densities p{x{t + At)\x{t)). To obtain a sampled 
trajectory that can be regarded as a realization of the 
SDE, St has to be chosen such small that the conditional 
densities become independent of St for smaller values. 
Figs. |l| and ^ shows this procedure for the first com- 
ponent of x{t + At) of the stochastic van der Pol os- 
cillator denoted by xi{t + At) for the two state vectors 
cc{t) = (-0.0935,-4.284) and x{t) = (1.021,-0.9375). 
The conditional densities p{xi{t + At)\xi{t)) were esti- 
mated by triangular kernel estimators ||l^ based on 5000 
integrations of the SDE. The quality of these estimated 
densities with respect to bias and variance depends on the 
width of the kernel. By visual inspection, the width of 
the kernel was chosen equal to 0.02 in Fig. |l| and equal to 
0.1 in Fig. H In Fig. |], the estimated conditional density 
changes drastically between St = 0.1s and St = O.Qls. It 
becomes independent from St for St = 0.001s. In Fig. ||, 
this already happens for St — 0.01s. The reason for the 
different behavior is that the time evolution in the first 
case experiences more of the nonlinearity of the system. 
By investigating numerous analogous simulations, we re- 
garded St = 0.001s as an appropriate integration time 
step. The two state space vectors x{t) used for the above 
simulation were realizations of the trajectory obtained 
with St = 0.001s. Thus, the procedure for determining 
St is an interactive one that has to be selfconsistent. 

Fig. H shows a realization of the stochastic van der Pol 
oscillator with fi = 3, St — 0.001 and At = 0.5. The two 
data points marked by arrows at time 28s, respectively 
28.5s were used for the estimation of the conditional den- 
sities shown in Figs. |l| and ^j. 



The above considerations also have consequences for 
mathematical models used in simulation studies and pro- 
posed for analyzing time series in the frame of nonlinear 
stocl^g^tic systems by difference equation of the type : 

xi{t)) + VSte{t) x{1(gf)l)^ fip,x{t),xit-l),...,xit~m)) + e{t) . 

(10) 



Here the sampling interval is set to unity, e{t) denotes 
uncorrelated Gaussian noise, p the parameters and m 
the order of the model, see e.g. |l^]. For these models 
the conditional distribution of a; (i -I- 1) given the history 
is Gaussian. Due to the Gaussianity, least squares op- 
timization leads to a maximum likelihood estimation of 
the parameters which is asyrnptotically unbiased and ef- 
ficient ||l^,^. Figs. |l| and g show that for difference 
equations that are thought to be discretized versions of 
SDEs, the dynamical noise should be non-Gaussian, see 
the skewed distribution in Fig. ^, and state dependent 
heteroscadistic. It might even be multimodal. 



III. PARAMETER ESTIMATION 

In this section we discuss three methods to estimate 
parameters in SDEs. Due to its superior statistical prop- 
erties, the most desirable method would be a maximum 
likelihood estimation [ p^jT9| . We argue that this ap- 
proach is not feasible. Then we discuss recently sug- 
gested quasi-maximum likelihood approaches |20|. The 
third approach applies the integration scheme, Eq. (H), 
for the whole sampling interval by using At — St. With 
respect to the identification of the two time scales this 
last approach is similar to a procedure to estimate pa- 
rameters in SDEs proposed in plj-psf. In the simulation 
studies presented in these publications the time series 
were sampled at the time step of integration. 

To simplify the exposition, we consider a scalar dy- 
namics and a single parameter 6 in the following. 



A. Maximum likelihood estimation 

Denoting the stationary distribution of the SDE for a 
given parameter hy tt{x\9), the likelihood for a sampled 
time series of length N reads : 

N-l 

£{x{ti),x{t2), x{tN);6) = T:(x{ti)\e) W p{x{t,+i)\x{ti), 9) 

i=l 

(11) 

Maximizing C{.;9) leads to an estimator 6 that is 
asymptotically unbiased and has least conservative con- 
fidence regions. Note that a biased estimator can lead 
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to erroneous interpretations of results, while suboptimal 
confidence regions 'only' lowers the power to reject sim- 
pler models in favor of more complex ones, but does not 
lead to statistically false positive results. 

Usually, the logarithm of the likelihood is considered 
and the term 7T{x{ti)\0) whose influence vanishes asymp- 
totically is neglected: 

JV-l 

L{xit2),x{t3),...,xitN)\0)^ ^\np{x{t,+i)\x{U),e) . 

i=\ 

(12) 

The maximum likelihood estimator dmie is defined by: 

^^L{x(t2),x{H),...,x{t^)-Q)^Q . (13) 

To obtain Omie an iterative optimization strategy has 
to be applied. Starting from an initial guess for the pa- 
rameter, the conditional densities p{x(ti+i)\x{ti),d) in 
Eq. (HI) have to be estimated and evaluated. This can 
be achieved by solving the corresponding Fokker-Planck 
equation or by applying the integration scheme, Eq. (||), 
with the actual guess of the parameters several times on 
the intervals [ti,ti+i] starting from the observed x{ti) to 
estimate the conditional density, e.g. by kernel estimation 
p6t . Then, the log- likelihood can be calculated. Based 
on this procedure, the parameter is changed until the log- 
likelihood is extremal by applying some optimization al- 
gorithm . The performance of this procedure depends 
heavily on the quality of the density estimation. Thus, for 
the procedure based on integrating the SDE thousands 
of trajectories in each interval [ti-^ , ti] have to be real- 
ized. Note that the densities shown in Figs. and || that 
were based on 5000 realizations are not smooth enough 
to enable a numerically stable estimation of the param- 
eters. Furthermore, this procedure involves the choice of 
a parameter determining the bandwidth of the density 
estimator. This parameter has to be chosen data driven 
and state dependent, e.g. by a computer intensive cross- 
validation procedure [|6|j2jj2^. Therefore, the desirable 
method of maximum likelihood is hardly applicable. 



B. Quasi maximum likelihood estimation 

Bibby and Sorensen [ pO| suggested to use a quasi max- 
imum likelihood estimator instead of the infeasible max- 
imum likelihood estimator, _Eq. (|l^). The key idea of 



this procedure is that Eq. (13) can formally be read as a 
search for a root : 



G{d) = 



(14) 



defining an estimator 9. By virtue of choosing G{9), 
the resulting computationally feasible estimator can be 
forced to be unbiased by paying the price that the result- 
ing confidence region are no longer optimal. Experience 



shows that the loss in optimality is often rather small [g6j . 
Eq. ( |l^ ) is called estimation equation, and the resulting 
estimator is called quasi maximum likelihood estimator. 

Bibby and Sorensen have shown that in the case 
of SDEs one possible choice for G{d) is given by: 



N~l 



G{0) = - E{x{t,+i)\x{U),e)) , (15) 



where gi{.) is a function that is derived from the SDE 
and E{x{ti+i)\x{ti),9) denotes the expected value of 
x{ti+i) conditioned on the observed x{ti) for a given 9. 
Different possible procedures for deriving gi{.) given the 
SDE are discussed in [pO|. 

In opposite to the conditional density necessary for the 
maximum likelihood case, the conditional expectation in 
Eq. (|l^) can the estimated reliably with a comparable 
small numbers of integrations in the intervals 
starting from the observed x{ti). 



C. "At = 5t" approach 

In the third approach we use the discretization scheme, 
Eq. (^) , on the interval of the whole sampling time step 
ISi by setting Ai = 5t in Eq. (||) . This yields an estimate 
x{{tij^i)\x{ti),6). The parameters are adjusted until the 
mean square error 



Af-l 



Y,{x{U+i) -x{{U+i)\x{U),9)f 



(16) 



i=l 



is minimized. This method neglects the existence of the 
two time scales for integrating and sampling SDEs. Fur- 
thermore, it implicitly assumes that the variance of the 
conditional density is state space independent and Gaus- 



IV. SIMULATION STUDY 

We investigate the behavior of the quasi maximum like- 
lihood and the " At — 6t" approach in a simulation study 
using t he st ochastic van der Pol oscillator introduced in 
Section II B. We integrated the process with 5t = O.OOls. 
The smaller the sampling time, the smaller the differences 
between h{.) in Eq. (|) and g{.) in Eq. (|) will be. Thus, 
the less severe the effects of ignoring the two time scales 
by the "At = St" approach will be ||3-i9|- Therefore, we 
investigate the behavior of the two approaches described 
above for different sampling times between At = 0.005s 
and At — 0.5s. As in the simulation study presented 
in ||2^, we assume that the whole state space vector is 
observed. 

Fo r the quasi maximum likelihood approach. Section 
IIIB , we chose the term gi{.) in Eq. ( [l5|) to be: 
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gi{x{ti)) = (1 - x\) X2 



(17) 



generalizing the line of argument of ||20|. The term 
"{x{U+i) - E{xiU+i)\x{U),ey in Eq. (|l5l) is read as 
{xi{ti+i) — E {xi{ti^i)\x{ti) , 9). Thus, we predict the ob- 
served location xi{ti^i) based on the whole state space 
vector at the present time step ti. The expected value of 
xi{ti+i) is estimated based on 50 integrations. Finding 
the zero of Eq. (^5|) is performed by ro utines from |^ . 
For the " Ai = df^ approach, Section [II C , we use the 



information from the whole state space from the history 
and from the present state. The minimization is perform 
by routines from |l2[ . 

Fig. ^ displays the results of the simulation study for 
sampling time steps At ranging from 0.005 to 0.5. The 2cr 
confidence intervals were calculated from 50 repetitions. 
The length of the time series for different sampling times 
were chosen such that 1000 data points enters the esti- 
mation. The quasi maximum likelihood approach yields 
unbiased results independent from the sampling interval. 
For realistic sampling intervals the "Ai — St" approach 
gives strongly biased results. Only for a sampling time 
of At < 0.025 its estimate is consistent with the true pa- 
rameter. This would require to sample the process with 
approximately 360 points per period. Note that classi- 
cal time series like the sunspot data and the Canadian 
lynx data are sampled with 11, respectively 10 points per 
period, see e.g. poj . 

Fig. ^ shows the results analogous to Fig. ^ for nonlin- 
earity parameter /i = 5. The relative bias of the "At ~ 
51" approach for sampling intervals 5t = 0.05s, 0.1s and 
0.5s is larger than for /z = 3 due to the higher degree of 
nonlinearity. Thus, in general, a sufficient sampling for 
the linearly approximating " At = 5t" approach to work 
depends on the degree of nonlinearity. 



V. DISCUSSION 

Modeling time series of open nonequilibrium systems 
by nonlinear stochastic differential equations (SDE) al- 
lows to take into account the effects of the huge number 
of degrees of freedom that might be active in such sys- 
tems. A fundamental problem in estimating parameters 
in SDEs is taught by the theory of integrating nonlinear 
stochastic differential equations. A central message from 
this theory is that there are in general two time scales: 
that of integration and that of sampling. As a conse- 
quence, the mean and the distributional characteristics 
of conditional densities on the time scale of sampling can, 
in general, not be related back to the parameters of the 
stochastic differential equation (SDE). 

We discussed three approaches for parameter estima- 
tion in SDEs based on sampled time series. Unfortu- 
nately, the desirable maximum likelihood approach is not 
feasible in the present context due to its extreme compu- 
tational burden to estimate the conditional distributions. 



It might become tractable with the advent of more pow- 
erful massive parallel computers. In the presented sim- 
ulation study, the quasi maximum likelihood approach 
yielded unbiased estimates for the parameter. Disregard- 
ing the existence of the two time scales of integration and 
of sampling in SDEs and using the discretization scheme 
of the SDE itself on the time scale of the sampling led to 
results that are unbiased only for the case of heavy over- 
sampling, but biased for conventional sampling rates. 

Thus, methods that require the sampling time interval 
to be an admissable integration time step, like suggested 
in 1^,^ , arc applicable to measured time series only for 
sufficiently sampled data. The same holds for the de- 
sirable approach to nonparametrically estimate the com- 
plete functional form of the deterministic drift term of a 
stochastic differential equation based on the discretiza- 
tion of the corresponding Fokker-Planck equation pro- 
posed in where formally even the limit "sampling 
interval — > 0" is required. 

The approaches discussed in this paper require to ob- 
serve the complete state space vector which is usually not 
possible. Our future work will concentrate on the gener- 
alization to the case of parameter estimation in nonlinear 
stochastic differential equations based on scalar observa- 
tions as it is already possible in linear stochastic and 
nonlinear deterministic systems 1^. 



[1] H. Bock, in Progress in Scientific Computing, edited by 
P. Deuflhard and E. Hairer (Birkhauser, Boston, 1983), 
Vol. 2, pp. 95-121. 

[2] G. Gouesbet, Phys. Rev. A 43, 5321 (1991). 

[3] E. Baake, M. Baake, H. Bock, and K. Briggs, Phys. Rev. 
A 45, 5524 (1992). 

[4] E. Baake and J. Schloder, Bull. Math. Biol. 54, 999 

(1992) . 

[5] J. Timmer, Int. J. Bif. Chaos 8, 1505 (1998). 

[6] J. Keizer, R. Fox, and J. Wagner, Phys. Lett. A 175, 17 

(1993) . 

[7] C. Gardiner, Handbook of Stochastic Methods (Springer, 
Berlin, 1997). 

[8] N. van Kampen, Stochastic Processes in Physics and 

Chemistry (Springer, Berlin, 1997). 
[9] J. Honerkamp, Stochastic Dynamical Systems (VCH, 

New York, 1993). 
[10] P. Kloeden, E. Platen, and H. Schurz, Int. J. Bif. Chaos 

1, 277 (1991). 

[11] P. Kloeden and E. Platen, Numerical solution of stochas- 
tic differential equations, Vol. 23 of Applications of Math- 
ematics (Springer, New York, 1992). 

[12] W. Press, B. Flannery, S. Saul, and W. VetterUng, 
Numerical Recipes (Cambridge University Press, Cam- 
bridge, 1992). 

[13] B. van der Pol, Phil. Mag. 43, 177 (1922). 

[14] C. Kurrer and K. Schulten, Physica D 50, 311 (1991). 



5 



[15] H. Leung, Physica A 221, 340 (1995). 
[16] B. Silverman, Density Estimation (Chapman and Hall, 
London, 1986). 

[17] H. Tong, Non-linear Time Series : A Dynamical Sys- 
tem Approach, Vol. 6 of Oxford statistical science series 
(Clarendon Press, Oxford, 1996). 

[18] E. Lehmann, Theory of point estimation, Chapt. 6.2 
(Wadsworth Inc., New York, 1991). 

[19] A. Azzalini, Statistical inference - based on the likelihood 
(Chapman and Hall, London, 1996). 

[20] B. Bibby and M. S0rensen, Bernoulh 1, 17 (1995). 

[21] L. Borland and H. Haken, Z. Phys. B - Cond. Matt. 88, 
95 (1992). 

[22] L. Borland and H. Haken, Ann. Phys. 1, 452 (1992). 
[23] S. Siegert, R. Friedrich, and J. Peinke, Phys. Lett. A 243, 
275 (1998). 

[24] T. Gasser and M. Rosenblatt, Smoothing techniques for 
curve estimation, No. 757 in Lecture Notes in Mathemat- 
ics (Springer, New York, 1979). 

[25] P. Craven and G. Wahba, Numerische Mathematik 31, 
317 (1979). 

[26] C. Heyde, Quasi-likelihood and its Application (Springer, 

New York, 1997). 
[27] D. Dacunha-Castelle and D. Florens-Zmirou, Stochastics 

19, 263 (1986). 
[28] D. Florens-Zmirou, Statistics 4, 547 (1989). 
[29] 1. Shoji and T. Ozaki, J. Time Series Analysis 18, 487 

(1997). 

[30] P. Brockwell and R. Davis, Time Series: Theory and 
Methods (Springer, New York, 1991). 



FIGURE CAPTIONS 

Fig. |l| Estimated conditional densities p{xi{t + At)\x(t)) 
for the stochastic van der Pol oscillator for /i = 3, 
x{t) = (—0.0935,-4.284), integration time steps 
5t = 0.1s, 0.01s, 0.001s, 0.0001s and sampling in- 
terval At = 0.5s. 

Fig. ^ Estimated conditional densities p{xi{t -f At)\x{t)) 
for the stochastic van der Pol oscillator for /i = 3, 
x{t) = (1.021,-0.9375), integration time steps 
5t = 0.1s, 0.01s, 0.001s, 0.0001s and sampling in- 
terval At — 0.5s. 

Fig. ^ Realization of the stochastic van der Pol oscillator 
for /I = 3, integration time step St = 0.001s and 
sampling interval At = 0.5s. The two data points 
marked by arrows at time 28s, respectively 28.5s 
were used for the estimation of the conditional den- 
sities shown in Figs. |l| and ||. 

Fig. ^ Dependence of the estimated parameter p, of the 
stochastic van der Pol oscillator on the sampling 
interval. The integration time step St was 0.001s. 
The 95% confidence intervals were calculated based 
on 50 repetitions. 4-: quasi maximum likeli- 
hood approach, *: "At = St" approach. For 
sake of clearity the results are slightly scattered 



around the applied sampling time steps At = 
0.005s, 0.01s, 0.025s, 0.05s, 0.1s and 0.5s. The true 
value of the parameter /i = 3 is marked by the solid 
line. 

Fig. ^ Analogous to Fig. ^ for /i = 5. 
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